migrate library scripts to ingest - FacDB #1313

Draft
wants to merge 12 commits into
base: main


fvankrieken
Contributor

@fvankrieken fvankrieken commented Dec 10, 2024

Closes #1290

Commit 1 contains a significant change: it adds an option to cast varchar fields from library to bigint during comparison. In my opinion, this should be used in one case only, when both of the following hold:

  • these columns are cast to int at the beginning of our builds
  • as part of validation/migration, downstream code changes are made to take advantage of the new, more "accurate" types.

Otherwise, this should not be done as part of the comparison when migrating from library to ingest.
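To make the idea concrete, here is a minimal sketch of how a comparison query could cast selected library columns before an EXCEPT. It is not the code in `dcpy/data/compare.py`: the function name `build_except_query`, its parameters, and the tiny `lib`/`ing` tables are all hypothetical, and SQLite is used only so the sketch runs without a database server (the real comparison runs against Postgres).

```python
import sqlite3


def build_except_query(left, right, columns, cast_to_numeric=()):
    """Build `SELECT ... FROM left EXCEPT SELECT ... FROM right`.

    Hypothetical helper: columns listed in `cast_to_numeric` are cast to
    BIGINT on the left (library) side only, so they line up with the
    right (ingest) side.
    """
    cast = set(cast_to_numeric)
    unknown = cast - set(columns)
    if unknown:
        raise ValueError(f"cannot cast columns that are not compared: {sorted(unknown)}")

    def quote(name):
        # Double-quote identifiers, escaping any embedded quotes.
        return '"' + name.replace('"', '""') + '"'

    left_cols = ", ".join(
        f"CAST({quote(c)} AS BIGINT) AS {quote(c)}" if c in cast else quote(c)
        for c in columns
    )
    right_cols = ", ".join(quote(c) for c in columns)
    return (
        f"SELECT {left_cols} FROM {quote(left)} "
        f"EXCEPT SELECT {right_cols} FROM {quote(right)}"
    )


# Illustrative data only: the library table stores prek as text, ingest as integer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lib (school_name TEXT, prek TEXT)")
conn.execute("CREATE TABLE ing (school_name TEXT, prek INTEGER)")
conn.executemany("INSERT INTO lib VALUES (?, ?)", [("A", "12"), ("B", "0")])
conn.executemany("INSERT INTO ing VALUES (?, ?)", [("A", 12), ("B", 0)])

query = build_except_query(
    "lib", "ing", ["school_name", "prek"], cast_to_numeric=["prek"]
)
left_only = conn.execute(query).fetchall()  # rows present only in the library table
```

Requiring the caller to name each column to cast, rather than casting automatically wherever types differ, keeps the coercion an explicit choice.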

For example, when comparing nysed_nonpublicenrollment, I originally got this:

ProgrammingError: (psycopg2.errors.DatatypeMismatch) EXCEPT types character varying and bigint cannot be matched
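A mismatch like this can be spotted before any EXCEPT query runs by comparing the column types of the two tables. The sketch below assumes the types are already available as plain dicts; the helper names `compare_columns` and `numeric_cast_candidates` are hypothetical and not part of dcpy, and the `ogc_fid` type is made up for illustration.

```python
def compare_columns(left_types, right_types):
    """Summarize shared columns, one-sided columns, and type differences.

    Both arguments map column name -> database type name.
    """
    both = sorted(left_types.keys() & right_types.keys())
    return {
        "both": both,
        "left_only": sorted(left_types.keys() - right_types.keys()),
        "right_only": sorted(right_types.keys() - left_types.keys()),
        "type_differences": {
            col: (left_types[col], right_types[col])
            for col in both
            if left_types[col] != right_types[col]
        },
    }


def numeric_cast_candidates(type_differences):
    """Columns that are varchar on the left and bigint on the right."""
    return sorted(
        col
        for col, (left, right) in type_differences.items()
        if left == "character varying" and right == "bigint"
    )


# Sample types; the ogc_fid type is an assumption made for this sketch.
library = {"prek": "character varying", "county": "character varying", "ogc_fid": "integer"}
ingest = {"prek": "bigint", "county": "text", "ogc_fid": "integer"}

report = compare_columns(library, ingest)
candidates = numeric_cast_candidates(report["type_differences"])
```

Here `county` shows up as a type difference (character varying vs. text) but is not a cast candidate, since only the varchar-to-bigint pairs are the ones named in the error.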

But now, running

lifecycle scripts validate_ingest compare nysed_nonpublicenrollment --c2n prek --c2n halfk --c2n fullk --c2n gr1 --c2n gr2 --c2n gr3 --c2n gr4 --c2n gr5 --c2n gr6 --c2n gr7 --c2n gr8 --c2n gr9 --c2n gr10 --c2n gr11 --c2n gr12 --c2n institution_id --c2n beds_code --c2n ugs --c2n uge

(a bit verbose, but I think forcing intentionality is good when this is sort of twisting the validation), I get

________________________________________________________________________________
Tables
    Left: nysed_nonpublicenrollment_library
    Right: nysed_nonpublicenrollment_ingest
________________________________________________________________________________
Row count
    Left: 1822
    Right: 1822
________________________________________________________________________________
Column comparison
    Both
        affliation
        beds_code
        county
        data_library_version
        fullk
        gr1
        gr10
        gr11
        gr12
        gr2
        gr3
        gr4
        gr5
        gr6
        gr7
        gr8
        gr9
        halfk
        institution_id
        ogc_fid
        prek
        school_name
        school_year
        uge
        ugs
    Left only: None
    Right only: None
    Type differences
        Halfk
            Left: character varying
            Right: bigint
        Institution id
            Left: character varying
            Right: bigint
        Gr11
            Left: character varying
            Right: bigint
        Beds code
            Left: character varying
            Right: bigint
        Gr8
            Left: character varying
            Right: bigint
        Gr6
            Left: character varying
            Right: bigint
        Prek
            Left: character varying
            Right: bigint
        Uge
            Left: character varying
            Right: bigint
        Gr7
            Left: character varying
            Right: bigint
        Gr2
            Left: character varying
            Right: bigint
        Gr9
            Left: character varying
            Right: bigint
        Gr10
            Left: character varying
            Right: bigint
        Gr1
            Left: character varying
            Right: bigint
        Gr3
            Left: character varying
            Right: bigint
        Affliation
            Left: character varying
            Right: text
        Gr12
            Left: character varying
            Right: bigint
        Ugs
            Left: character varying
            Right: bigint
        Gr5
            Left: character varying
            Right: bigint
        School name
            Left: character varying
            Right: text
        Gr4
            Left: character varying
            Right: bigint
        School year
            Left: character varying
            Right: text
        County
            Left: character varying
            Right: text
        Fullk
            Left: character varying
            Right: bigint
________________________________________________________________________________
Data comparison
    Compared columns
        affliation
        beds_code
        county
        fullk
        gr1
        gr10
        gr11
        gr12
        gr2
        gr3
        gr4
        gr5
        gr6
        gr7
        gr8
        gr9
        halfk
        institution_id
        prek
        school_name
        school_year
        uge
        ugs
    Ignored columns
        ogc_fid
        data_library_version
    Columns coerced to numeric
        prek
        halfk
        fullk
        gr1
        gr2
        gr3
        gr4
        gr5
        gr6
        gr7
        gr8
        gr9
        gr10
        gr11
        gr12
        institution_id
        beds_code
        ugs
        uge
    Left only
        Empty DataFrame
        Columns: [halfk, institution_id, gr11, beds_code, gr8, gr6, prek, uge, gr7, gr2, gr9, gr10, gr1, gr3, affliation, gr12, ugs, gr5, school_name, gr4, school_year, county, fullk]
        Index: []
    Right only
        Empty DataFrame
        Columns: [halfk, institution_id, gr11, beds_code, gr8, gr6, prek, uge, gr7, gr2, gr9, gr10, gr1, gr3, affliation, gr12, ugs, gr5, school_name, gr4, school_year, county, fullk]
        Index: []
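The one-flag-per-column pattern used in the command above can be modeled with a repeatable option. This is only a sketch using the standard library's `argparse`; the actual `validate_ingest` script may be built with a different CLI library, and the parser layout and the `cast_to_numeric` attribute name are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring `validate_ingest compare <dataset> --c2n <col> ...`."""
    parser = argparse.ArgumentParser(prog="validate_ingest")
    subparsers = parser.add_subparsers(dest="command", required=True)
    compare = subparsers.add_parser("compare")
    compare.add_argument("dataset")
    # action="append" collects every occurrence of --c2n into one list,
    # so each column to coerce has to be named explicitly.
    compare.add_argument(
        "--c2n",
        dest="cast_to_numeric",
        action="append",
        default=[],
        metavar="COLUMN",
    )
    return parser


args = build_parser().parse_args(
    ["compare", "nysed_nonpublicenrollment", "--c2n", "prek", "--c2n", "halfk"]
)
no_cast_args = build_parser().parse_args(["compare", "nysed_nonpublicenrollment"])
```

With no `--c2n` flags the list is empty, so a plain comparison never coerces anything unless asked to.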


codecov bot commented Dec 10, 2024

Codecov Report

Attention: Patch coverage is 28.00000% with 18 lines in your changes missing coverage. Please review.

Project coverage is 71.19%. Comparing base (a041b06) to head (82ace91).

Files with missing lines | Patch % | Lines
dcpy/lifecycle/scripts/validate_ingest.py | 0.00% | 11 Missing ⚠️
dcpy/data/compare.py | 0.00% | 7 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1313      +/-   ##
==========================================
+ Coverage   70.00%   71.19%   +1.18%     
==========================================
  Files         114      109       -5     
  Lines        6121     6026      -95     
  Branches      702      697       -5     
==========================================
+ Hits         4285     4290       +5     
+ Misses       1690     1590     -100     
  Partials      146      146              


@fvankrieken fvankrieken force-pushed the fvk-ingest-facdb branch 15 times, most recently from 729baf7 to 288ef9d Compare December 18, 2024 17:34
Labels
None yet
Projects
Status: New
Development

Successfully merging this pull request may close these issues.

Ingest Migration - FacDB script sources
1 participant